Skip to content

Conversation

@Angeladadd
Copy link

@Angeladadd Angeladadd commented Jul 17, 2025

Description

This PR introduces a two-phase optimisation to address communication bottlenecks in the copy_states routine during distributed resampling:

  1. Deduplication & Local Replication:
  • Transmit each unique particle only once across ranks.
  • Reconstruct duplicates locally with multi-threaded replication.
  1. Communication-Efficient Redistribution:
  • Reformulates resampling redistribution as a lightweight rank-level transportation problem.
  • Minimises unnecessary cross-rank transfers caused by global index ordering.
  • Solved efficiently with HiGHS, adding negligible overhead relative to communication savings.

Issue

#116

Testing

  • Added integration test 7 for covering the cases using optimised copy states and resampling function
  • Added mpi test for optimised copy states function
  • Added unit test for optimised resampling function
  • Added slurm scripts for
    • running end-to-end run_particle.jl
    • running mpi test for optimised copy states function

@codecov
Copy link

codecov bot commented Sep 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.22%. Comparing base (e314fbd) to head (8a821d9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #297      +/-   ##
==========================================
+ Coverage   94.64%   96.22%   +1.57%     
==========================================
  Files           9        9              
  Lines         654      715      +61     
==========================================
+ Hits          619      688      +69     
+ Misses         35       27       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ucabc46 and others added 6 commits September 1, 2025 15:41
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v5)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
try fixing benchmarking

upgrade manifest

add Test(try fixing benchmarking)

let ci refresh manifest

Revert "let ci refresh manifest"

This reverts commit 5385785.

try fixing benchmarking
@Angeladadd Angeladadd force-pushed the cgsun/copy_states_refine branch from 8ac5abb to 273cde9 Compare September 3, 2025 22:54
@Angeladadd Angeladadd marked this pull request as ready for review September 4, 2025 02:56
Copy link
Member

@tkoskela tkoskela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Chenge, I hope you are doing well. Sorry it's taken so long to review your code after marking the report. I think this is very good work and should definitely get merged into the upstream repo.

I'd like to have a few clarifying comments in some places that I've highlighted in my review comments. I think overall the optimisations you made should be the default behaviour, rather than something the user has to switch on. This would especially simplify the copy_states! and copy_states_dedup! functions that duplicate some code at the moment.

If Matt and Mose could also take a look at this, that would be great!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this notebook reproduce the plots of your report? It's really nice to have it if it does! It could use some plain text description of what it's doing.

Comment on lines +7 to +11
# Verify BLAS implementation is OpenBLAS
@assert occursin("openblas", string(BLAS.get_config()))

# Set size of thread pool for BLAS operations to 1
BLAS.set_num_threads(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels sensible. We don't want BLAS oversubscribing threads. Could you put a comment here to explain why we require OpenBLAS?

src/params.jl Outdated
particle_save_time_indices::V = []
seed::Union{Nothing, Int} = nothing
n_tasks::Int = -1
optimize_copy_states::Bool = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you demonstrated this works well, I would like to have it true by default.

@@ -1,5 +1,6 @@
[deps]
ChunkSplitters = "ae650224-84b6-46f8-82ea-d812ca08434e"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being used anywhere? Do we need the test for stale dependencies @giordano

src/utils.jl Outdated
Comment on lines 92 to 97
dedup::Bool = false
) where T

if dedup
return copy_states_dedup!(particles, buffer, resampling_indices, my_rank, nprt_per_rank, to)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit of code duplication here that is not ideal, and I think is confusing GitHub in showing the diff. I would be happy with replacing the old copy_states! with the new deduplicating version entirely. I think you showed the overhead of removing the duplicates is small in all realistic cases. That would make the code easier to read and maintain in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why the timer object is in the arguments of this function. If I still remember how this works, in the main run_particle_filter function the timer will be updated by the @timeit_debug macro in the calling function and returning it as an argument is redundant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timer object is in the arguments because we have a separate testing script for the copy_states! function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants